EtiCor: Corpus for Analyzing LLMs for Etiquettes
Etiquettes are an essential ingredient of day-to-day interactions among
people. Moreover, etiquettes are region-specific, and etiquettes in one region
might contradict those in other regions. In this paper, we propose EtiCor, an
Etiquettes Corpus containing texts about social norms from five different regions
across the globe. The corpus provides a test bed for evaluating LLMs for
knowledge and understanding of region-specific etiquettes. Additionally, we
propose the task of Etiquette Sensitivity. We experiment with state-of-the-art
LLMs (Delphi, Falcon40B, and GPT-3.5). Initial results indicate that LLMs
mostly fail to understand etiquettes from regions of the non-Western world.
Comment: Accepted at EMNLP 2023, Main Conference
Multi-Task Learning Framework for Extracting Emotion Cause Span and Entailment in Conversations
Predicting emotions expressed in text is a well-studied problem in the NLP
community. Recently there has been active research in extracting the cause of
an emotion expressed in text. Most previous work has addressed causal emotion
entailment in documents. In this work, we propose neural models to extract
emotion cause span and entailment in conversations. For learning such models,
we use the RECCON dataset, which is annotated with cause spans at the utterance
level. In particular, we propose MuTEC, an end-to-end Multi-Task learning
framework for extracting emotions, emotion cause, and entailment in
conversations. This is in contrast to existing baseline models that use ground
truth emotions to extract the cause. MuTEC performs better than the baselines
for most of the data folds provided in the dataset.
Comment: 19 Pages, Accepted at Workshop on Transfer Learning for Natural
Language Processing, NeurIPS 202
ISLTranslate: Dataset for Translating Indian Sign Language
Sign languages are the primary means of communication for many
hard-of-hearing people worldwide. Recently, to bridge the communication gap
between the hard-of-hearing community and the rest of the population, several
sign language translation datasets have been proposed to enable the development
of statistical sign language translation systems. However, there is a dearth of
sign language resources for Indian Sign Language. This resource paper
introduces ISLTranslate, a translation dataset for continuous Indian Sign
Language (ISL) consisting of 31k ISL-English sentence/phrase pairs. To the best
of our knowledge, it is the largest translation dataset for continuous Indian
Sign Language. We provide a detailed analysis of the dataset. To validate the
performance of existing end-to-end sign-language-to-spoken-language translation
systems, we benchmark the created dataset with a transformer-based model for
ISL translation.
Comment: Accepted at ACL 2023 Findings, 8 Pages